The usage of deep neural networks in safety-critical systems is limited by our ability to guarantee their correct behavior. Runtime monitors are components aiming to identify unsafe predictions and discard them before they can lead to catastrophic consequences. Several recent works on runtime monitoring have focused on out-of-distribution (OOD) detection, i.e., identifying inputs that are different from the training data. In this work, we argue that OOD detection is not a well-suited framework to design efficient runtime monitors and that it is more relevant to evaluate monitors based on their ability to discard incorrect predictions. We call this setting out-of-model-scope detection and discuss the conceptual differences with OOD. We also conduct extensive experiments on popular datasets from the literature to show that studying monitors in the OOD setting can be misleading: 1. very good OOD results can give a false impression of safety, 2. comparison under the OOD setting does not allow identifying the best monitor to detect errors. Finally, we also show that removing erroneous training data samples helps to train better monitors.
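The gap between the two evaluation settings can be illustrated with a small sketch. This is not the authors' evaluation code: the `auroc` helper and the six-sample toy data are assumptions made up for illustration. It scores one monitor against two different ground truths, "input is OOD" and "model prediction is wrong".

```python
def auroc(scores, labels):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data, one entry per test input.
monitor_score = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]  # higher = monitor wants to reject
is_ood = [1, 1, 1, 0, 0, 0]    # input differs from the training distribution
is_error = [0, 0, 1, 0, 0, 1]  # the model's prediction is incorrect

auroc_ood = auroc(monitor_score, is_ood)    # OOD-detection view
auroc_oms = auroc(monitor_score, is_error)  # out-of-model-scope view

print(f"AUROC vs. OOD labels:   {auroc_ood:.2f}")
print(f"AUROC vs. error labels: {auroc_oms:.2f}")
```

In this toy case the monitor separates OOD inputs perfectly, yet it is no better than chance at flagging wrong predictions: it rejects two OOD inputs the model handles correctly and misses an in-distribution error.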
Consider $n$ points independently sampled from a density $p$ of class $\mathcal{C}^2$ on a smooth compact $d$-dimensional sub-manifold $\mathcal{M}$ of $\mathbb{R}^m$, and consider the generator of a random walk visiting these points according to a transition kernel $K$. We study the almost sure uniform convergence of this operator to the diffusive Laplace-Beltrami operator when $n$ tends to infinity. This work extends known results of the past 15 years. In particular, our result does not require the kernel $K$ to be continuous, which covers the cases of walks exploring $k$NN-random and geometric graphs, and convergence rates are given. The distance between the random walk generator and the limiting operator is separated into several terms: a statistical term, related to the law of large numbers, which is treated with concentration tools, and an approximation term, which we control with tools from differential geometry. The convergence of $k$NN Laplacians is detailed.
Recently, Person Re-Identification (Re-ID) has received a lot of attention. Large datasets containing labeled images of various individuals have been released, allowing researchers to develop and test many successful approaches. However, when such Re-ID models are deployed in new cities or environments, the task of searching for people within a network of security cameras is likely to face an important domain shift, thus resulting in decreased performance. Indeed, while most public datasets were collected in a limited geographic area, images from a new city present different features (e.g., people's ethnicity and clothing style, weather, architecture, etc.). In addition, the whole frames of the video streams must be converted into cropped images of people using pedestrian detection models, which behave differently from the human annotators who created the dataset used for training. To better understand the extent of this issue, this paper introduces a complete methodology to evaluate Re-ID approaches and training datasets with respect to their suitability for unsupervised deployment for live operations. This method is used to benchmark four Re-ID approaches on three datasets, providing insight and guidelines that can help to design better Re-ID pipelines in the future.
Even though deep neural networks (DNNs) achieve state-of-the-art results for a number of problems involving genomic data, getting DNNs to explain their decision-making process has been a major challenge due to their black-box nature. One way to get DNNs to explain their reasoning for a prediction is via attribution methods, which are assumed to highlight the parts of the input that contribute the most to the prediction. Given the existence of numerous attribution methods and a lack of quantitative results on the fidelity of those methods, selection of an attribution method for sequence-based tasks has been mostly done qualitatively. In this work, we take a step towards identifying the most faithful attribution method by proposing a computational approach that utilizes point mutations. Providing quantitative results on seven popular attribution methods, we find Layerwise Relevance Propagation (LRP) to be the most appropriate one for translation initiation, with LRP identifying two important biological features for translation: the integrity of the Kozak sequence as well as the detrimental effects of premature stop codons.
Calibration is a popular framework to evaluate whether a classifier knows when it does not know - i.e., its predictive probabilities are a good indication of how likely a prediction is to be correct. Correctness is commonly estimated against the human majority class. Recently, calibration to human majority has been measured on tasks where humans inherently disagree about which class applies. We show that measuring calibration to human majority given inherent disagreements is theoretically problematic, demonstrate this empirically on the ChaosNLI dataset, and derive several instance-level measures of calibration that capture key statistical properties of human judgements - class frequency, ranking and entropy.
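As a rough illustration of why majority-based correctness can hide disagreement, the sketch below compares a model's predictive distribution with the full distribution of human votes on a single instance. The vote counts, the probabilities, and all helper names are hypothetical, and the three quantities are simple stand-ins for the class frequency, ranking and entropy properties named above, not the measures derived in the paper.

```python
import math

def normalize(counts):
    """Turn raw vote counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def entropy(p):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(q * math.log(q) for q in p if q > 0)

def total_variation(p, q):
    """Total variation distance between two distributions over the same classes."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def ranking(p):
    """Class indices ordered from most to least probable."""
    return sorted(range(len(p)), key=lambda i: -p[i])

# Hypothetical single instance with three classes.
human_votes = [50, 40, 10]       # annotators disagree substantially
model_probs = [0.9, 0.05, 0.05]  # the model is very confident

human_dist = normalize(human_votes)
majority_match = ranking(human_dist)[0] == ranking(model_probs)[0]
tv = total_variation(human_dist, model_probs)
entropy_gap = entropy(human_dist) - entropy(model_probs)

print("matches human majority:", majority_match)
print("class-frequency distance (TV):", round(tv, 3))
print("entropy gap (human - model):", round(entropy_gap, 3))
```

Here the model agrees with the human majority, so a majority-based check would count it as correct, while the distance and entropy gap show that its confidence does not reflect how divided the annotators are.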
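Person Re-Identification (Re-ID) aims to search for a person of interest (the query) in a network of cameras. In the classic Re-ID setting, the query is sought in a gallery containing properly cropped images of entire bodies. Recently, the live Re-ID setting was introduced to better represent the practical application context of Re-ID. It consists of searching for the query in short videos containing whole scene frames. The initial live Re-ID baseline used a pedestrian detector to build a large search gallery and a classic Re-ID model to find the query in the gallery. However, the resulting galleries were too large and contained low-quality images, which degraded live Re-ID performance. Here, we present a new live Re-ID approach called TrADe, which generates smaller, high-quality galleries. TrADe first uses a tracking algorithm to identify sequences of images of the same individual in the gallery. Then, an anomaly detection model is used to select a single good representative for each tracklet. TrADe is validated on the live Re-ID version of the PRID-2011 dataset and shows significant improvements over the baseline.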
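Humans are able to negotiate planned and unplanned behaviors with remarkable agility and ease. The goal of this paper is to systematically investigate the translation of this human behavior to bipedal walking robots, even though the morphology is inherently different. Specifically, we begin with human data of planned and unplanned downsteps. We analyze this data from the perspective of human reduced-order modeling, encoding the center of mass (CoM) kinematics and contact forces, which allows these behaviors to be translated to the corresponding reduced-order model of a bipedal robot. We embed the resulting behaviors into the full-order dynamics of a bipedal robot via nonlinear optimization-based controllers. The end result is the demonstration of planned and unplanned downsteps in simulation on an underactuated walking robot.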
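With the increasing use of machine learning (ML) in critical autonomous systems, runtime monitors have been developed to detect prediction errors and keep the system in a safe state during operation. Monitors have been proposed for different applications involving diverse perception tasks and ML models, and specific evaluation procedures and metrics are used in different contexts. This paper introduces three unified safety-oriented metrics, representing the safety benefits of the monitor (Safety Gain), the remaining safety gaps after using it (Residual Hazard), and its negative impact on the system's performance (Availability Cost). To compute these metrics, two return functions must be defined, representing how a given ML prediction will impact expected future rewards and hazards. Three use cases (classification, drone landing, and autonomous driving) are used to demonstrate how metrics from the literature can be expressed in terms of the proposed metrics. Experimental results on these examples show how different evaluation choices affect the perceived performance of a monitor. Because our formalism requires us to formulate explicit safety assumptions, it allows us to ensure that the evaluation conducted matches the high-level system requirements.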
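We present CenDerNet, a framework for 6D pose estimation from multi-view images based on center and curvature representations. Finding precise poses for reflective, textureless objects is a key challenge for industrial robotics. Our approach consists of three stages: first, a fully convolutional neural network predicts center and curvature heatmaps for each view; second, the center heatmaps are used to detect object instances and find their 3D centers; third, 6D object poses are estimated using the 3D centers and curvature heatmaps. By jointly optimizing poses across views using a render-and-compare approach, our method naturally handles occlusions and object symmetries. We show that CenDerNet outperforms previous methods on two industry-relevant datasets: DIMO and T-LESS.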
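With the growing availability of data in various scientific domains, generative models hold enormous potential to accelerate scientific discovery at every step of the scientific method. Perhaps their most valuable application lies in speeding up what has traditionally been the slowest and most challenging step: coming up with a hypothesis. Powerful representations are now being learned from large volumes of data to generate novel hypotheses, which is having a significant impact on scientific discovery applications ranging from material design to drug discovery. GT4SD (https://github.com/gt4sd/gt4sd-core) is an extensible open-source library that enables scientists, developers and researchers to train and use state-of-the-art generative models for hypothesis generation in scientific discovery. GT4SD supports a variety of uses of generative models across material science and drug discovery, including molecular discovery and design based on properties related to target proteins, omic profiles, scaffold distances, binding energies and more.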